
Risk Aversion in Finite Markov Decision Processes Using Total Cost Criteria and Average Value at Risk


Abstract

In this paper we present an algorithm to compute risk averse policies in Markov Decision Processes (MDP) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk averse policies are needed when large deviations from the expected behavior may have detrimental effects, and conventional MDP algorithms usually ignore this aspect. We provide conditions for the structure of the underlying MDP ensuring that approximations for the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, whereby a robot is tasked with the objective of reaching a target location within a temporal deadline where increased speed is associated with increased probability of failure. We demonstrate that the proposed algorithm not only produces a risk averse policy reducing the probability of exceeding the expected temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
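
For readers unfamiliar with the AVaR metric named in the abstract, the following is a minimal sketch and not the paper's algorithm. It assumes the common definition of AVaR at level alpha as the average cost over the worst (1 - alpha) share of probability mass, with higher cost being worse. The function name `avar` and the example cost distribution are hypothetical and chosen only for illustration.

```python
def avar(costs, probs, alpha):
    """Average value at risk of a discrete cost distribution.

    Assumption: AVaR at level alpha is the mean cost over the worst
    (1 - alpha) fraction of probability mass, where a higher cost is worse.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if len(costs) != len(probs):
        raise ValueError("costs and probs must have the same length")
    if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")

    tail = 1.0 - alpha          # probability mass of the tail to average over
    remaining = tail
    weighted_sum = 0.0
    # Walk through outcomes from the most expensive to the cheapest,
    # consuming probability mass until the tail is filled.
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        take = min(prob, remaining)
        weighted_sum += take * cost
        remaining -= take
        if remaining <= 0.0:
            break
    return weighted_sum / tail


# Hypothetical total-cost distribution (e.g. time to reach a target).
example_costs = [1.0, 2.0, 10.0]
example_probs = [0.5, 0.4, 0.1]

risk_neutral = avar(example_costs, example_probs, 0.0)   # plain expected cost
risk_averse = avar(example_costs, example_probs, 0.9)    # mean of worst 10%
```

With alpha = 0 the function reduces to the ordinary expected cost, which is what a conventional MDP criterion would optimize. Raising alpha shifts the weight toward the rare, expensive outcomes that a risk averse policy tries to avoid.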
